 machine-generated content


StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis

Li, Siyuan, Wulianghai, Aodu, Lin, Xi, Li, Guangyan, Chen, Xiang, Wu, Jun, Li, Jianhua

arXiv.org Artificial Intelligence

With the increasing integration of large language models (LLMs) into open-domain writing, detecting machine-generated text has become a critical task for ensuring content authenticity and trust. Existing approaches rely on statistical discrepancies or model-specific heuristics to distinguish between LLM-generated and human-written text. However, these methods struggle in real-world scenarios due to limited generalization, vulnerability to paraphrasing, and lack of explainability, particularly when facing stylistic diversity or hybrid human-AI authorship. In this work, we propose StyleDecipher, a robust and explainable detection framework that revisits LLM-generated text detection using combined feature extractors to quantify stylistic differences. By jointly modeling discrete stylistic indicators and continuous stylistic representations derived from semantic embeddings, StyleDecipher captures distinctive style-level divergences between human and LLM outputs within a unified representation space. This framework enables accurate, explainable, and domain-agnostic detection without requiring access to model internals or labeled segments. Extensive experiments across five diverse domains, including news, code, essays, reviews, and academic abstracts, demonstrate that StyleDecipher consistently achieves state-of-the-art in-domain accuracy. Moreover, in cross-domain evaluations, it surpasses existing baselines by up to 36.30%, while maintaining robustness against adversarial perturbations and mixed human-AI content. Further qualitative and quantitative analysis confirms that stylistic signals provide explainable evidence for distinguishing machine-generated text. Our source code can be accessed at https://github.com/SiyuanLi00/StyleDecipher.
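The idea of jointly modeling discrete stylistic indicators and continuous representations can be illustrated with a minimal sketch. This is not the StyleDecipher implementation: the three indicators, the function names, and the hashed bag-of-words `toy_embedding` (a stand-in for a real semantic embedding model) are all assumptions made for illustration.

```python
import hashlib
import re


def discrete_indicators(text):
    """Hypothetical discrete stylistic indicators (not the paper's feature set)."""
    tokens = re.findall(r"\w+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_tok = len(tokens) or 1  # avoid division by zero on empty input
    return [
        len(tokens) / max(len(sentences), 1),       # mean sentence length in tokens
        len(set(tokens)) / n_tok,                   # type-token ratio
        sum(text.count(c) for c in ",;:") / n_tok,  # punctuation marks per token
    ]


def toy_embedding(text, dim=8):
    """Stand-in for a semantic embedding: hashed bag-of-words, normalized to sum to 1."""
    vec = [0.0] * dim
    for tok in re.findall(r"\w+", text.lower()):
        idx = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def style_features(text, dim=8):
    """Concatenate discrete and continuous parts into one feature vector."""
    return discrete_indicators(text) + toy_embedding(text, dim)


if __name__ == "__main__":
    print(style_features("Hello, world. This is a short test."))
```

A vector like this could then be passed to any off-the-shelf classifier; in practice the toy embedding would be replaced by a real sentence-embedding model.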


Evaluating Machine Expertise: How Graduate Students Develop Frameworks for Assessing GenAI Content

Chen, Celia, Leitch, Alex

arXiv.org Artificial Intelligence

This paper examines how graduate students develop frameworks for evaluating machine-generated expertise in web-based interactions with large language models (LLMs). Through a qualitative study combining surveys, LLM interaction transcripts, and in-depth interviews with 14 graduate students, we identify patterns in how these emerging professionals assess and engage with AI-generated content. Our findings reveal that students construct evaluation frameworks shaped by three main factors: professional identity, verification capabilities, and system navigation experience. Rather than uniformly accepting or rejecting LLM outputs, students protect domains central to their professional identities while delegating others--with managers preserving conceptual work, designers safeguarding creative processes, and programmers maintaining control over core technical expertise. These evaluation frameworks are further influenced by students' ability to verify different types of content and their experience navigating complex systems. This research contributes to web science by highlighting emerging human-genAI interaction patterns and suggesting how platforms might better support users in developing effective frameworks for evaluating machine-generated expertise signals in AI-mediated web environments.


RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection

Huang, Liting, Zhang, Zhihao, Zhang, Yiran, Zhou, Xiyue, Wang, Shoujin

arXiv.org Artificial Intelligence

The recent advancements in generative AI models, which can create realistic and human-like content, are significantly transforming how people communicate, create, and work. While the appropriate use of generative AI models can benefit society, their misuse poses significant threats to data reliability and authentication. However, due to a lack of aligned multimodal datasets, effective and robust methods for detecting machine-generated content are still in the early stages of development. In this paper, we introduce RU-AI, a new large-scale multimodal dataset designed for the robust and efficient detection of machine-generated content in text, image, and voice. Our dataset is constructed from three large publicly available datasets: Flickr8K, COCO, and Places205, by combining the original datasets and their corresponding machine-generated pairs. Additionally, experimental results show that our proposed unified model, which incorporates a multimodal embedding module with a multilayer perceptron network, can effectively determine the origin of the data (i.e., original data samples or machine-generated ones) from RU-AI. However, future work is still required to address the remaining challenges posed by RU-AI. The source code and dataset are available at https://github.com/ZhihaoZhang97/RU-AI.


MUGC: Machine Generated versus User Generated Content Detection

Xie, Yaqi, Rawal, Anjali, Cen, Yujing, Zhao, Dixuan, Narang, Sunil K, Sushmita, Shanu

arXiv.org Artificial Intelligence

As advanced modern systems like deep neural networks (DNNs) and generative AI continue to enhance their capabilities in producing convincing and realistic content, the need to distinguish between user-generated and machine-generated content is becoming increasingly evident. In this research, we undertake a comparative evaluation of eight traditional machine-learning algorithms to distinguish between machine-generated and human-generated data across three diverse datasets: Poems, Abstracts, and Essays. Our results indicate that traditional methods demonstrate a high level of accuracy in identifying machine-generated data, reflecting the documented effectiveness of popular pre-trained models like RoBERT. We note that machine-generated texts tend to be shorter and exhibit less word variety compared to human-generated content. While specific domain-related keywords commonly utilized by humans, albeit disregarded by current LLMs (Large Language Models), may contribute to this high detection accuracy, we show that deeper word representations like word2vec can capture subtle semantic variances. Furthermore, readability, bias, moral, and affect comparisons reveal a discernible contrast between machine-generated and human-generated content. There are variations in expression styles and potentially underlying biases in the data sources (human and machine-generated). This study provides valuable insights into the advancing capacities and challenges associated with machine-generated content across various domains.
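The two surface signals mentioned in this abstract, text length and word variety, can be measured with a few lines of code. The sketch below is an illustration only: the tokenizer, the function name `lexical_stats`, and the two sample strings are assumptions, not the authors' features or data.

```python
import re


def lexical_stats(text: str) -> dict:
    """Return the token count and type-token ratio (unique tokens / total tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    ttr = len(set(tokens)) / n if n else 0.0
    return {"num_tokens": n, "type_token_ratio": ttr}


# Made-up sample strings, chosen only to contrast varied and repetitive wording.
varied = "The river bent twice before the mill, where herons waited in the reeds."
repetitive = "The river is nice. The river is long. The river is nice and long."

if __name__ == "__main__":
    print(lexical_stats(varied))
    print(lexical_stats(repetitive))
```

Statistics like these could serve as input features for the kind of traditional classifiers the abstract compares. Type-token ratio is sensitive to text length, so comparisons are most meaningful between texts of similar size.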


From SEO To GEO: What GPT Marketers Need to Know

#artificialintelligence

If you are 25 or younger, chances are high that you never encountered the paper version of Yellow Pages, but throughout the 20th century, print directories were among the primary ways for consumers and businesses to connect. Established in 1886, Yellow Pages posted its final print issue in January 2019, closing the chapter on a 130-plus-year history of print directory marketing. In the late 1990s, the exotic new profession of online directory marketing emerged with the rise of Yahoo! and other online directories, and quickly disappeared as the search engines took over. Search engine, to be precise, since Google quickly took the lion's share of the market in the early 2000s. Since then, every business has been bombarded by armies of search engine optimization (SEO) marketers offering to analyze and optimize its websites and social networks, along with all kinds of tricks designed to get the business to the top of search results.


How to Use AI to Provide Seamless Email Journeys for Your Customers

#artificialintelligence

You probably had your first dose of artificial intelligence (AI) through Steven Spielberg's sci-fi films that depict robots coming to life and gaining control over the world. Though this is still a distant dream, it wouldn't come as a surprise if such advancements actually came true in the near future. As the world moves at breakneck speed, the concept of AI is drawing mixed emotions from all around the world. Mark Zuckerberg and Bill Gates referred to AI as a "holy grail" in the realms of computer science and technology that would go a long way in making lives easier. Looking beyond the scenario of robots taking over the world, AI has arrived in the business landscape, transforming critical operations and processes for tons of businesses worldwide.


Three Ways To Use AI And Machine Learning To Create Customers For Life

#artificialintelligence

Only a few years ago, business-to-business (B2B) technology companies would sell a solution to a customer, wait three to five years, then reapproach that same customer to offer a renewal or a completely new product. But these days, the initial purchase doesn't necessarily translate into a continuum of sales, and it doesn't hold the promise of customer retention like it once did. That's because today's customers expect much more before they will remain loyal. Getting them to love your brand and love your products takes a customer-first mindset and a company-wide commitment to improve the customer experience. Companies that excel at customer experience are using artificial intelligence (AI) and machine learning heavily to produce immersive, authentic experiences across every customer touch point.


How AI is Impacting Content Marketing

#artificialintelligence

While there are plenty of dire-sounding discussions taking place these days around artificial intelligence (AI) and machine learning--and their potential to disrupt the world as we know it--this isn't technology of the future. New technologies are promising to upend the traditional ways in which content is conceived, produced, and disseminated. A Copyblogger article from as far back as 2015 noted that both Forbes and the Associated Press were producing machine-generated content. These examples are likely to both thrill and chill content marketers, depending on where they're perched along the content creation continuum--including the need to generate an increasing volume of content and to make a living from creating that content. For now, though, there is fortunately less to fear than there is to cheer, says Natalia Markova, senior web content strategist with Jellyfish, a global digital agency.